Abstract

Background: Prediction models for old-age mortality have generally relied upon conventional markers such as 
plasma-based factors and biophysiological characteristics. However, it is unknown whether the existing markers 
are able to provide the most relevant information in terms of old-age survival or whether predictions could be 
improved through the integration of whole-genome expression profiles. 

Methods: We assessed the predictive abilities of survival models containing only conventional markers, only gene 
expression data or both types of data together in a Vitality 90+ study cohort consisting of n = 151 nonagenarians. 
The all-cause death rate was 32.5% (49 of 151 individuals), and the median follow-up time was 2.55 years. 

Results: Three different feature selection models, the penalized Lasso and Ridge regressions and the C-index 
boosting algorithm, were used to test the genomic data. The Ridge regression model incorporating both the 
conventional markers and transcripts outperformed the other models. The multivariate Cox regression model was 
used to adjust for the conventional mortality prediction markers, i.e., the body mass index, frailty index and cell-free 
DNA level, revealing that 331 transcripts were independently associated with survival. The final mortality-predicting 
transcriptomic signature derived from the Ridge regression model was mapped to a network that identified nuclear
factor kappa B (NF-κB) as a central node.

Conclusions: Together with the loss of physiological reserves, the transcriptomic predictors centered around NF-κB
underscored the role of immunoinflammatory signaling, the control of the DNA damage response and cell cycle, 
and mitochondrial functions as the key determinants of old-age mortality. 
Background

Human longevity has proven to be a complex trait, and 
the factors enabling survival to old age are diverse. A 
great deal of variation also exists in the state of health in 
which old age is attained; some individuals age with 
good cognitive and physical health, whereas others suf- 
fer from multimorbidity and disabilities in daily func- 
tioning. Nevertheless, a variety of biomarkers, such as 
immunoinflammatory factors, endocrine mediators
and indicators of functional capabilities and frailty, 
have been reproducibly demonstrated to be predictive 
of old-age survival in different populations [1-3]. In 
very old individuals, an elevated low-grade inflamma- 
tory state (inflammaging), which is a manifestation of 
immune aging, can be particularly useful for identify- 
ing those individuals at the greatest risk of mortality 
[1,4]. Indeed, elevated levels of conventional circulating
inflammatory markers, such as interleukin 1 receptor
antagonist (IL-1ra), IL-6, C-reactive protein (CRP)
and tumor necrosis factor alpha (TNF-α), are among
the factors that have been reported to be predictive of 
old-age mortality [5,6]. We have also recently identified a
novel biomarker, circulating cell-free DNA, which in 
addition to reflecting the rate of systemic inflammation and 
tissue degeneration, predicts all-cause mortality in elderly 
individuals, independent of common risk factors, such as 
cardiovascular disease, dementia and diabetes [7]. Additionally,
inflammaging has been regarded as a driver of the
archetypal preconditions of old-age mortality, specifically 
neurodegeneration, muscle wasting and frailty [reviewed in 
[1,8]]. These observations indicate the central roles of 
the immune system and inflammatory functions in late-life 
survival. 

However, only certain immunoinflammatory functions 
can be captured by assessing circulating inflammatory 
markers and determining the numbers of distinct leukocyte 
subsets. It is also unclear whether these parameters yield 
complete pictures of the biological processes that are cru- 
cial in old-age mortality. Indeed, the predictions of out- 
comes in patients with certain types of cancer have recently 
been shown to benefit from the incorporation of gene ex- 
pression profiles into traditional clinical cancer risk scores 
[9,10]. In other potentially fatal conditions, such as stroke, 
the use of blood-based gene expression data in combin- 
ation with other disease-associated measurements has 
allowed for valid classifications for the disease etiology [11]. 
However, the value of combining gene expression micro- 
array data with traditional mortality predictors has not been 
evaluated in association with age-associated mortality. Like- 
wise, it is largely unknown whether the changes in the gene 
expression patterns reported for a plethora of age-regulated 
transcripts [12-14] are also related to late-life mortality. 

To address these issues, we sought to systematically 
determine the predictive performances of a wide array of 
conventional markers, whole-genome transcriptomic 
data and the combination of these data with regard to 
all-cause mortality. We observed that the Ridge regres- 
sion model, containing the body mass index (BMI) and 
frailty index (conventional predictors) together with nine 
transcripts related to immunoinflammatory processes, cell 
cycle control and mitochondrial functions yielded the best- 
performing final signature model in terms of discriminative 
power and goodness-of-fit. The network analysis of the 
mortality-associated transcripts revealed that their actions 
were largely mediated through nuclear factor kappa B
(NF-κB) signaling. Thus, in addition to demonstrating the
usefulness of combining transcriptomic data with conven- 
tional markers in the assessment of late-life survival, our 
results provide novel insights into the transcriptomic land- 
scape preceding all-cause mortality in old age. 

Methods 

Study population 

The study population consisted of n = 151 nonagenarians 
(n = 106 women and n = 45 men) participating in the
Vitality 90+ Study, which is an ongoing study of individuals
aged 90 years and older who reside in the city of
Tampere, Finland. The individuals in the current study 
population were born in 1920 and were recruited and 
characterized as in the previous Vitality 90+ study cohort 
[5,7]. A home-visiting trained medical student performed 
the blood tests, physiological measurements, interviews and 
performance tests. Written informed consent was obtained 
from each participant and the study protocol followed the 
guidelines of the Declaration of Helsinki. The all-cause 
mortality data (median follow-up time of 2.55 years) includ- 
ing the dates of death, were collected from the Population 
Register Center. The mortality rate during the follow-up
was 32.5%; of the 151 individuals, 49 died and 102
survived the follow-up period. There were no losses to 
follow-up. The study protocol was approved by the 
Ethics Committee of the Pirkanmaa Hospital District and 
the Ethics Committee of the Tampere Health Center. 

RNA extraction and whole-genome transcriptomic analysis 

The protocols for the leukocyte separation, RNA isola- 
tion and microarray analysis have been previously de- 
scribed [13,15]. Briefly, peripheral blood mononuclear 
cells (PBMCs) were extracted using Ficoll-Paque density 
gradients (Ficoll-Paque™ Premium, GE Healthcare Bio- 
Sciences AB, Uppsala, Sweden), after which the cells 
were stored at -70°C in RNAlater solution (Ambion 
Inc., Austin, TX, USA). Following RNA extraction 
(miRNeasy Mini Kit, Qiagen, Hilden, Germany) and 
amplification (Illumina TotalPrep RNA amplification 
Kit, Ambion Inc., Austin, TX, USA), the RNA was hy- 
bridized to a HumanHT-12 v4 Expression BeadChip 
(Cat no. BD-103-0204; Illumina, San Diego, CA, USA) 
and the chips were scanned using Beadscan (Illumina 
Inc., CA, USA). The qualities of the biotinylated com- 
plementary RNA products were assessed with the 
Agilent 2100 Bioanalyzer (Agilent Technologies Inc., 
Santa Clara, CA, USA). The validation of the micro- 
array expression data through qPCR was performed as 
previously described [13]. The microarray data are
available in the GEO database (http://www.ncbi.nlm.nih.gov/geo/)
under accession number GSE40366.

The preprocessing of the microarray data was performed
using the Chipster v2.8 software (http://chipster.csc.fi/)
[16]. A box plot and density plots were constructed
and principal component analyses were performed
to assess the quality of the data. Using the lumi
pipeline, the background was corrected with the
bgAdjust.affy package, and the data were quantile-normalized
and log2-transformed to achieve normality. Background
noise and poor-quality data were filtered out based on
expression levels (fluorescence intensities); the probes
showing expression values of <5 or >100 in more than 5
(3.3%) samples per transcript were excluded from the
analysis.

Ingenuity Pathway Analysis (IPA, Ingenuity
Systems, Redwood City, CA, US) was used to generate 
the networks and the statistically significant canonical 
pathways to which the identified mortality-associated 
transcripts were mapped. The IPA network generation 
algorithm creates networks by combining molecules 
(transcripts) based on the maximization of their specific 
connectivity, which is assessed as their interconnected- 
ness relative to all molecules they are connected to in 
the Ingenuity Knowledge Base. The networks are ranked 
and scored based on the number of the Ingenuity Know- 
ledge Base Network Eligible molecules they contain. In 
the network images, direct molecular relationships are 
displayed with continuous lines and indirect relation- 
ships with dashed lines. The significance of the associ- 
ation between the dataset and the canonical pathway is 
measured by IPA in two ways: i) based on the ratio of 
the number of molecules from the dataset that map to 
the pathway divided by the total number of molecules in 
the given pathway and ii) through the calculation of a
Benjamini-Hochberg (B-H)-corrected p-value for mul- 
tiple testing, which determines the probability that the 
association between the transcripts in the dataset and 
the canonical pathway is explained by chance alone. B-H-corrected
p-values <0.05 (corresponding to 1.3 on a -log10
scale) were considered to be statistically significant.
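
As a minimal illustration of the significance criterion described above, the following Python sketch applies the standard Benjamini-Hochberg step-up adjustment to a list of p-values and expresses the 0.05 threshold on the -log10 scale. The function names and the example p-values are illustrative assumptions and are not taken from the study; the internal implementation used by IPA is not described here and may differ in detail.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def neg_log10(p):
    """Express a p-value on the -log10 scale."""
    return -math.log10(p)


# Hypothetical raw p-values for four pathways (not data from the study).
example = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example)
threshold = neg_log10(0.05)  # roughly 1.3, the cut-off quoted in the text
significant = [neg_log10(p) > threshold for p in adjusted]
```

A pathway is called significant in this sketch when its adjusted p-value, on the -log10 scale, exceeds the threshold of about 1.3.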

Biochemical measurements and flow cytometry 

The methods used to measure the plasma cf-DNA 
levels, unmethylated cf-DNA levels, Alu repeat cf-DNA levels,
plasma mitochondrial DNA copy numbers, and CRP, IL-6, and
IL-10 levels have been previously described [15]. The 
techniques for the determination of the plasma levels 
of IL-1β, IL-7, cortisol, dehydroepiandrosterone sulfate
(DHEAS), indoleamine 2,3-dioxygenase (IDO) activity and 
anti-Epstein Barr virus (EBV) and anti-cytomegalovirus 
(CMV) antibody titers are described in the Additional 
file 1. Immunosenescence was assessed as the proportions
of CD4+CD28- cells and CD8+CD28- cells and
the ratio of CD4+ and CD8+ cells. The flow cytometric 
analysis used for the determination of immune cell 
proportions has been described in detail elsewhere 
[17]. Briefly, PBMCs were labeled with FITC-CD14, 
PerCP-Cy5.5-CD3, APC-CD28 (eBioscience, San Diego, 
CA, USA), PE-Cy7-CD4 and APC-Cy7-CD8 (BD
Biosciences, Franklin Lakes, NJ, USA). The results 
were analyzed using the BD FACS Diva software, ver- 
sion 6.1.3 (BD Biosciences, Franklin Lakes, NJ, USA). 

Assessments of physiological characteristics, functional 
performance and frailty 

The techniques used to assess the anthropometric char- 
acteristics and functional performance, i.e., the Barthel 
index, handgrip and Mini-Mental State Examination
(MMSE), have been previously described [5,18]. The ability
to perform the chair-rise test (yes/no) was assessed as the
ability to stand up once from a straight-backed, regular- 
height chair without the use of the arms, whereas the ability 
to perform the chair-stand test (yes/no) was assessed as the 
ability to stand up and sit down five consecutive times from 
a straight-backed, regular-height chair. The method for 
determining the frailty score, which was based on cri- 
teria outlined by Fried et al. [3], has been described 
elsewhere [15]. The frailty index for each individual 
was assigned based on the frailty score as follows: 
0 points = non-frail, 1-2 points = pre-frail, and 3-5 
points = frail. Blood pressure was measured in a sitting
position using an OMRON M4 automatic sphygmomanometer.
The mean of two consecutive measurements
was considered to be the final value. 
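
The mapping from the frailty score to the three-level frailty index stated above can be written as a small lookup. This is only a sketch of the stated cut-offs; the function name and the range check are assumptions added for illustration, and the derivation of the score itself (described in [15]) is not reproduced here.

```python
def frailty_category(score):
    """Map a frailty score (0-5 points) to the three-level frailty index."""
    if not 0 <= score <= 5:
        raise ValueError("frailty score must be between 0 and 5")
    if score == 0:
        return "non-frail"
    if score <= 2:
        return "pre-frail"
    return "frail"


# One category per possible score, from 0 to 5 points.
categories = [frailty_category(s) for s in range(6)]
```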



Statistical analyses 

The characteristics of the study population are presented 
in Table 1. Each parameter presented in Table 1 was 
tested for its association with mortality through a uni- 
variate Cox regression analysis. A multivariate Cox re- 
gression model was fitted for all significant univariate 
predictors in Table 2 (left side) using stepwise selection 
to eliminate non-significant variables at the p = 0.05 
level. The conventional variables predicting mortality in 
the Cox multivariate model were BMI, the frailty index 
and the cf-DNA level (Table 2, right side). An outline of 
the assessment procedure for the mortality-predicting 
signature is presented in Figure 1. 

For high-dimensional predictors, such as whole-genome 
transcriptomic data, the traditional Cox regression model 
cannot be directly applied. As a general rule of thumb, the 
Cox model should be used only when there are a mini- 
mum of 10 events per predictor variable (EPV), or at least 
5-9 EPV under certain circumstances [19]. Thus, we first 
used the Cox univariate selection method to test the 
mortality associations of each of the 8,893 transcripts that
passed the raw data preprocessing procedure. Statistical 
significance was set at p < 0.05; all transcripts passing this 
level were subjected to further modeling. Individual as- 
sessments of the transcripts revealed that 478 were signifi- 
cantly associated with survival. After individually adjusting 
these 478 transcripts for the conventional predictors 
(BMI, frailty index and cf-DNA level) in the multivariate 
Cox model, 331 transcripts remained significantly associ- 
ated with mortality (p < 0.05). We then performed dimen- 
sion reduction and feature selection using the Ridge and 
Lasso penalized regression models and the C-index boosting
algorithm; all 331 significant transcripts were included
in the models without adjusting for multiple
testing (please see the next paragraph for the model 
characteristics). 
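
To make the events-per-variable (EPV) rule of thumb concrete, the short sketch below computes how many predictors a conventional Cox model could carry with the 49 deaths observed in this cohort. The helper name is an assumption introduced for illustration; the event count and the number of transcripts are taken from the text.

```python
def max_predictors(n_events, events_per_variable=10):
    """Largest number of predictors allowed under an EPV rule of thumb."""
    return n_events // events_per_variable


n_events = 49          # deaths during follow-up
n_transcripts = 8893   # transcripts passing preprocessing

strict_limit = max_predictors(n_events, 10)   # 10 EPV
relaxed_limit = max_predictors(n_events, 5)   # 5 EPV, the lower bound quoted above
```

Under either threshold the admissible number of predictors is orders of magnitude below the 8,893 candidate transcripts, which is why univariate screening and penalized models were needed.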



Jylhävä et al. BMC Medical Genomics 2014, 7:54
http://www.biomedcentral.com/1755-8794/7/54






Table 1 Characteristics of the study population

                                              Non-survivors           Survivors
Variable                                      Mean/Median  SEM/IQR/%  Mean/Median  SEM/IQR/%
Women (n/%)                                   36           73.5       70           68.6
Age (months)                                  1079.7       0.52       1079.9       0.32
Systolic blood pressure (mmHg)                141          3.75       150          2.92
Diastolic blood pressure (mmHg)*              70.5         14.5       74.5         19.0
Weight (kg)                                   63.3         1.96       70.0         1.32
BMI (kg/m²)                                   24.8         0.67       27.1         0.46
Waist circumference (cm)                      89.4         1.93       94.0         1.25
Hip circumference (cm)*                       98.5         11.0       102          11.5
MMSE*                                         24.0         7.0        26.0         4.0
Barthel index*                                95.0         20.0       97.5         5.0
Handgrip (kg)*                                18.0         10.5       20.0         6.5
Able to perform chair-rise test (n = yes/%)   29           63.0       82           80.4
Able to perform chair-stand test (n = yes/%)  32           72.7       87           86.1
Frailty index (n/%)
  Non-frail                                   3            6.1        35           34.3
  Pre-frail                                   32           65.3       52           51.0
  Frail                                       14           28.6       15           14.7
CRP level (ng/ml)*                            2.2          7.5        2.0          3.2
IL-1β level (pg/ml)*                          14.4         27.4       20.9         33.5
IL-6 level (pg/ml)*                           4.9          3.1        3.8          3.9
IL-7 level (pg/ml)*                           8.0          4.5        7.5          5.4
IL-10 level (pg/ml)*                          1.56         1.23       1.52         1.62
cf-DNA level (pg/ml)*                         0.92         0.21       0.87         0.17
Unmethylated cf-DNA level (pg/ml)*            0.73         0.20       0.67         0.16
Plasma mtDNA (copy number)*                   4.27E8       2.68E8     3.64E8       2.32E8
Alu repeat cf-DNA (GE)*                       80.2         62.7       66.5         38.3
DHEAS (ug/ml)*                                0.25         0.48       0.24         0.29
Cortisol (ng/ml)*                             133          55.9       125          60.9
IDO activity (Kyn/Trp)*                       52.7         23.3       50.8         23.2
Anti-CMV antibody titer                       19200        1145       19141        830
Anti-EBV antibody titer*                      410          310        385          380
CD3+ cells (%)* a                             60.9         21.5       57.0         13.8
CD4+ cells (%) b                              62.3         2.38       63.6         1.42
CD8+ cells (%) b                              31.0         2.21       29.2         1.33
CD4+/CD8+ cells (ratio)*                      2.29         2.40       2.29         2.38
CD4+CD28- cells (%)* c                        11.0         17.0       10.0         12.0
CD8+CD28- cells (%)* d                        65.2         29.4       69.1         23.7
CD14+ cells (%)* a                            9.5          8.6        9.5          6.4

*median value and IQR presented.

a percentage of live-gated cells, b percentage of total T lymphocytes (CD3+ cells),
c percentage of CD4+ cells, d percentage of CD8+ cells.

Abbreviations: BMI body mass index, CD cluster of differentiation, CMV cytomegalovirus, CRP C-reactive protein, cf-DNA cell-free DNA, DHEAS dehydroepiandrosterone 
sulphate, EBV Epstein-Barr virus, GE genomic equivalent, IDO indoleamine 2,3-dioxygenase, IL interleukin, Kyn kynurenine, MMSE Mini-Mental State Examination, 
mtDNA mitochondrial DNA, Trp tryptophan. 
Table 2 Mortality-predicting variables

                                              Univariate                  Multivariate
Variable                                      HR (95% CI)       P         HR (95% CI)       P
Systolic blood pressure                       0.99 (0.98-1.00)  0.039
Diastolic blood pressure                      0.97 (0.95-1.00)  0.031
Weight                                        0.97 (0.94-0.99)  0.003
BMI                                           0.90 (0.84-0.97)  0.004     0.91 (0.85-0.97)  0.007
Hip circumference                             0.95 (0.92-0.99)  0.010
MMSE                                          0.91 (0.87-0.95)  <0.001
Barthel index                                 0.97 (0.96-0.99)  <0.001
Handgrip                                      0.95 (0.91-0.99)  0.010
Able to perform chair-rise test (ref. = no)   0.41 (0.23-0.73)  0.002
Able to perform chair-stand test (ref. = no)  0.39 (0.22-0.71)  0.002
cf-DNA level                                  5.17 (1.64-16.4)  0.005     3.82 (1.18-12.3)  0.025
Unmethylated cf-DNA level                     5.28 (1.62-17.2)  0.006
Frailty index (ref. = non-frail)
  Pre-frail                                   5.90 (1.80-19.3)  0.003     5.35 (1.63-17.6)  0.006
  Frail                                       8.46 (2.43-29.5)  0.001     6.29 (1.77-22.4)  0.005

Abbreviations: BMI body mass index, cf-DNA cell-free DNA, CI confidence interval, HR hazard ratio, MMSE Mini-Mental State Examination.

The variables predicting mortality in the Cox univariate assessment are presented on the left side of the table and the variables remaining as independent
predictors in the stepwise Cox multivariate model are presented on the right side of the table.



Using the 331 mortality-associated transcripts, we 
proceeded to test and utilize three different dimension 
reduction methods for feature selection. The Ridge re- 
gression model [20] shrinks the regression coefficients 
by imposing penalties on their squared values. Penalized 
maximum likelihood estimation in Cox regression with 
the Ridge penalty was introduced by Verweij and van 
Houwelingen [21], whereas Van Houwelingen et al. [22]
proposed the use of the Cox model with a quadratic
penalty to predict survival time based on transcriptomic 
data. The least absolute shrinkage and selection operator 
(Lasso) was introduced by Tibshirani [23]. Lasso shrinks 
regression coefficients toward zero by penalizing the 
sizes of the coefficients but uses absolute values instead 
of the squared values. Penalizing based on absolute
values results in a number of the estimated coefficients



Figure 1 Outline of the assessment procedure for the mortality-prediction signature.
Flowchart, conventional markers: univariate Cox regression to identify the conventional mortality predictors; multivariate Cox
regression to identify the independent conventional predictors (BMI, frailty index and cf-DNA level).
Flowchart, transcriptomic data: univariate Cox regression to identify the mortality-predicting transcripts (n=478); identification of
the independent transcriptomic predictors by adjusting for BMI, frailty index and cf-DNA level (n=331); pathway and network analyses
for the 331 transcripts.
Flowchart, model evaluation: evaluation of the Lasso and Ridge penalized models and the C-index boosting algorithm containing the
different variable combinations, a) conventional markers, b) transcriptomic data, c) conventional markers and transcriptomic data
together; assessment of model performance by discriminative power, accuracy of predictive modeling and relative goodness of fit.
Flowchart, final signature: dimension reduction (feature selection) using the penalized Ridge regression model containing the
conventional markers and the transcriptomic data; Cox multivariate model to identify the final mortality-predicting signature;
network analysis for the final signature transcripts to elucidate the underlying biology.
becoming exactly zero. When performing Lasso or Ridge
regression, the tuning parameter (λ) must be determined
to control the amount of shrinkage. The optimal
value of λ can be estimated through cross-validation; we
chose the tuning parameter by maximizing the 10-fold
cross-validated log partial likelihood. After defining the
optimal λ, this value was used to obtain parameter estimates
for the transcriptomic data-only model and the
model containing the conventional predictors and the
transcriptomic data (the combined model). The R package
penalized was employed for the Lasso and Ridge regression
with the "unpenalized" argument for the conventional
variables in the combined model. As a third method, we tested
the C-index boosting algorithm, which has been presented 
as an alternative means for the derivation of marker (gene) 
combinations via a gradient boosting framework and the 
direct optimization of the C-index [24]. 
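
The difference between the two penalties can be sketched in a few lines of plain Python. The code below evaluates the Cox log partial likelihood and subtracts either a Ridge (squared) or a Lasso (absolute) penalty, with a mask that leaves selected coefficients unpenalized, which is loosely analogous to the "unpenalized" argument mentioned above. It is a teaching sketch under stated assumptions (no tied event times, a hypothetical three-subject data set, an arbitrary penalty scaling) and is not the implementation of the R package penalized, whose scaling conventions may differ.

```python
import math


def cox_log_partial_likelihood(beta, X, time, event):
    """Cox log partial likelihood, assuming no tied event times."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, died in enumerate(event):
        if not died:
            continue
        # Risk set: subjects still under observation at subject i's event time.
        risk_sum = sum(math.exp(eta[j]) for j in range(len(time)) if time[j] >= time[i])
        loglik += eta[i] - math.log(risk_sum)
    return loglik


def penalty(beta, lam, kind, penalized):
    """Ridge or Lasso penalty computed over the penalized coefficients only."""
    terms = [b for b, is_penalized in zip(beta, penalized) if is_penalized]
    if kind == "ridge":
        return lam * sum(b * b for b in terms)
    if kind == "lasso":
        return lam * sum(abs(b) for b in terms)
    raise ValueError("kind must be 'ridge' or 'lasso'")


def penalized_loglik(beta, X, time, event, lam, kind, penalized):
    """Objective to maximize: log partial likelihood minus the penalty."""
    return cox_log_partial_likelihood(beta, X, time, event) - penalty(beta, lam, kind, penalized)


# Hypothetical toy data: column 0 plays the role of a conventional marker
# (left unpenalized), column 1 the role of a transcript (penalized).
X = [[1.0, 0.5], [0.0, -0.5], [1.0, 0.0]]
time = [1.0, 2.0, 3.0]
event = [1, 1, 0]
mask = [False, True]

ridge_value = penalized_loglik([0.2, 0.4], X, time, event, 0.5, "ridge", mask)
lasso_value = penalized_loglik([0.2, 0.4], X, time, event, 0.5, "lasso", mask)
```

In practice the tuning parameter would be chosen by cross-validation, as described above, rather than fixed at an arbitrary value such as 0.5.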

We began the model selection process by evaluating 
the predictive performance of each model and the vari- 
able combinations through cross-validation, for which 
the data were split into training and test sets, and the 
differences in the deviance and an R² measure based on
the Brier score (iRBS) were calculated (described
in Additional file 1). This was followed by
the assessment of the selected model for its Akaike In- 
formation Criterion (AIC) and Harrell's C (also concord- 
ance index or C-index), which is a measure of the 
separation of two survival distributions [25]. The C- 
index is a performance characteristic for survival 
models, and it represents the fraction of all pairs of subjects
whose predictions are correctly ordered among the
pairs that can be ordered. The C-index estimates the
probability that the order of the predictions of a pair of 
comparable subjects is consistent with their observed 
survival data. 
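
The definition of the C-index given above can be illustrated with a direct pairwise computation. The sketch below is a simple quadratic-time version written for clarity; the function name, the handling of tied risk scores (counted as one half) and the toy data are assumptions for illustration and do not reproduce the software used in the study.

```python
def harrell_c(time, event, risk):
    """Fraction of comparable pairs whose predicted risks are correctly ordered."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when the subject with the shorter
            # follow-up time actually experienced the event.
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up times (years), event indicators and risk scores.
time = [1.0, 2.0, 3.0, 4.0]
event = [1, 1, 0, 1]
c_perfect = harrell_c(time, event, [4, 3, 2, 1])
c_mixed = harrell_c(time, event, [3, 4, 2, 1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of the comparable pairs.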

The cut-off point for the absolute values of the coeffi- 
cients was 0.0365 for the best-performing prediction 
model (Ridge regression with conventional predictors 
and transcriptomic data). All predictors showing a re- 
gression coefficient above the cut-off point were fitted to 
a multivariate Cox regression model using a stepwise se- 
lection method. The Cox model assumes proportional 
hazards, i.e., a log-linear relationship between the hazard 
rates and the independent parameters in the model dur- 
ing the follow-up period. The violation of the propor- 
tionality assumption during the follow-up period was 
assessed by extending the Cox model to incorporate 
time-dependent covariates representing the interactions 
between each of the independent parameters and the 
parametric function of the follow-up time. We also cal- 
culated the scaled Schoenfeld residuals for each inde- 
pendent parameter. Testing time-dependent covariates is 
equivalent to testing for a non-zero slope in a generalized
linear regression of scaled Schoenfeld residuals as a
function of time. A non-zero slope is an indication of a
violation of the proportional hazard assumption. Based 
on the global test, no evidence of a statistically signifi- 
cant dependence of mortality on time was observed (p = 
0.11). All Cox regression models were performed using 
the Stata software (version 13.0 for Windows, StataCorp 
LP, TX, USA). 
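
The idea of testing for a non-zero slope can be sketched as follows. The code assumes that scaled Schoenfeld residuals for one covariate and the corresponding event times are already available (both are hypothetical inputs here) and uses ordinary least squares as a simple stand-in for the regression described above. It is not the Stata procedure used in the study.

```python
import math


def slope_test(times, residuals):
    """OLS slope of residuals on time, with its standard error and t statistic."""
    n = len(times)
    mean_t = sum(times) / n
    mean_r = sum(residuals) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, residuals))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_t
    # Residual sum of squares around the fitted line.
    sse = sum((r - intercept - slope * t) ** 2 for t, r in zip(times, residuals))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, se, t_stat


# Hypothetical event times (years) and scaled Schoenfeld residuals.
times = [0.5, 1.0, 1.5, 2.0, 2.5]
residuals = [0.2, -0.1, 0.1, -0.2, 0.0]
slope, se, t_stat = slope_test(times, residuals)
```

A slope close to zero relative to its standard error is consistent with proportional hazards for that covariate.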

Results 

The distributions of all examined variables (i.e., the 
conventional markers) are presented in Table 1. The 
conventional markers that were observed to predict 
mortality in the univariate and multivariate Cox regres- 
sion models are presented in Table 2. Sex was not 
associated with mortality (p = 0.476) in Cox univariate 
regression in this cohort; thus, it was not included in the 
further models (please see Additional file 1 for the clarifica- 
tion behind this somewhat unexpected result). Likewise, 
age in months was not associated with mortality (p = 0.654)
in the Cox univariate regression model. 

The 478 transcripts displaying expression levels associ- 
ated with survival in the Cox univariate regression 
model are presented in Additional file 2: Table S1, and
the 331 transcripts that remained as independent mor- 
tality predictors after adjustment for BMI, frailty index 
and cf-DNA level are presented in Additional file 3: 
Table S2. The top 10 canonical pathways to which these 
331 transcripts were mapped are presented in Table 3; 
these pathways exhibited a preponderance of various im- 
mune signaling functions. The top-ranked network that 
was generated via IPA from these 331 transcripts (IPA 
score = 38) consisted of the Cell Death and Survival, Inflammatory
Response and Cellular Function and Maintenance
functions (Additional file 4: Figure S1). The
the conventional markers alone and the three different 
feature selection models (the Lasso and Ridge regres- 
sions and the C-index boosting algorithm) were evalu- 
ated for their predictive accuracies (generalizabilities) 
using the deviance from the null model and iRBS. The 
evaluation criteria revealed that the model containing 
the conventional markers alone and the Ridge regression 
model containing both the conventional markers and 
transcriptomic data (i.e., the combined model) were 
superior to the other models, displaying the lowest 
median values for the deviance from the null model 
and the highest median values in the iRBS assessment 
(Additional file 5: Figure S2 and Additional file 6: 
Figure S3, respectively). In general, the remaining models,
including all models based on the transcriptomic data alone,
performed poorly in the generalizability assessment
(Additional file 5: Figure S2 and Additional file 6: Figure S3).
Table 3 The 10 most significant mortality-associated canonical pathways

Ingenuity canonical pathway                     -log(p)*  Ratio  Transcripts
LPS-stimulated MAPK Signaling                   2.97      0.11   NFKBIA, MAP2K2, PIK3C3, RAC1, MAPK9, IKBKE, MAP2K3, ELK1, PRKCB
CD28 Signaling in T Helper Cells                2.68      0.08   CALM1 (includes others), NFKBIA, MAP2K2, PIK3C3, HLA-DRA, RAC1, MAPK9, CD86, IKBKE, ARPC4
B Cell Receptor Signaling                       2.33      0.07   CALM1 (includes others), NFKBIA, MAP2K2, PIK3C3, RAC1, MAPK9, IKBKE, MAP2K3, INPP5K, ELK1, PRKCB
[illegible] Signaling                           2.33      0.10   [transcript list illegible in source]
Pyridoxal 5'-phosphate Salvage Pathway          2.33      0.11   MAP2K2, PIM1, GRK6, CDK6, MAPK9, MAP2K3, IRAK1
Natural Killer Cell Signaling                   2.33      0.08   KIR2DL1/KIR2DL3, KIR3DL1, MAP2K2, PIK3C3, RAC1, INPP5K, KIR2DL4, SH2D1B, PRKCB
IL-1 Signaling                                  2.22      0.08   TOLLIP, NFKBIA, MAPK9, IKBKE, MAP2K3, GNAI3, PRKAR1A, IRAK1
Salvage Pathways of Pyrimidine Ribonucleotides  2.22      0.09   NME4, MAP2K2, PIM1, GRK6, CDK6, MAPK9, MAP2K3, IRAK1
CD27 Signaling in Lymphocytes                   2.15      0.11   SIVA1, NFKBIA, MAP2K2, MAPK9, IKBKE, MAP2K3
PI3K Signaling in B Lymphocytes                 2.15      0.07   CALM1 (includes others), NFKBIA, MAP2K2, FOXO3, RAC1, IKBKE, PLEKHA1, ELK1, PRKCB

*Benjamini-Hochberg-corrected p-value.

The presented pathways are generated from the 331 transcripts that predicted mortality independent of BMI, frailty index and cf-DNA level.



We next evaluated the goodness-of-fit (AIC) and discrim- 
inative power (Harrell's C) of the variable combinations, be- 
ginning with the model containing only the conventional 
markers and adding the Ridge regression-identified tran- 
scripts one-by-one (Additional file 7: Table S3). The rank- 
ings and regression coefficients for the combined Ridge 
regression model are presented in Additional file 8: Table 
S4. Notably, marked improvements were observed in the 
models' discriminative powers (from 71.1% to 85.7%) and 
goodness-of-fit values (from 449.5 to 391.1) up to
model no. 15 following the addition of the transcriptomic
predictors (Additional file 7: Table S3). Therefore, model no.
15 was considered to be the final mortality signature. The
stepwise Cox regression analysis of the final signature 
demonstrated that high expression levels of lymphotoxin 
alpha (LTA), NME/NM23 nucleoside diphosphate kinase 4 
(NME4) and growth arrest and DNA-damage-inducible 
beta (GADD45B) and low expression levels of myelin 
basic protein (MBP), SH2 domain containing 1B (SH2D1B),
checkpoint kinase 2 (CHEK2), leucine-rich repeats and
calponin homology domain containing 3 (LRCH3), transmembrane
protein 70 (TMEM70) and vitamin K epoxide
reductase complex, subunit 1 (VKORC1) together with a
low BMI and increased frailty were the most predictive of 
mortality (Table 4). The highest-ranking IPA-generated net- 
work incorporated 7/9 of the final signature transcripts and 
consisted of the following functions: Cell Cycle, Cell Death 
and Survival, and Hematological System Development and 
Function (Figure 2). 
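
The columns reported in Table 4 are linked by simple relationships that can be checked numerically. The sketch below assumes that the S.E. column is the standard error of the hazard ratio itself (so that, by the delta method, the standard error of the log-hazard coefficient is approximately S.E./HR); this reading is an assumption, although it reproduces the tabulated Z values and confidence limits for the frailty rows to within rounding. The function name is introduced here for illustration.

```python
import math


def wald_from_hr(hr, se_hr, z_crit=1.96):
    """Recover the coefficient, its SE, the Wald z and the 95% CI from HR and SE(HR)."""
    beta = math.log(hr)
    # Delta method (assumed): SE(HR) is approximately HR times SE(beta).
    se_beta = se_hr / hr
    z = beta / se_beta
    lower = math.exp(beta - z_crit * se_beta)
    upper = math.exp(beta + z_crit * se_beta)
    return beta, se_beta, z, (lower, upper)


# Pre-frail and frail rows of Table 4 (HR and S.E. as tabulated).
_, _, z_prefrail, ci_prefrail = wald_from_hr(9.53, 6.12)
_, _, z_frail, ci_frail = wald_from_hr(17.7, 12.14)
```

Small discrepancies for other rows are expected because the tabulated HR and S.E. values are rounded to two decimals.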

Discussion 

The prediction of mortality in very elderly individuals 
has traditionally relied upon markers reflecting immuno- 
inflammatory and endocrine functions and parameters 
involving physiological capabilities. In this study, we
demonstrated that integrating gene expression data
into a model containing these traditional predictors re- 
sults in the improved prediction of old-age mortality in 
terms of the discriminative power and goodness-of-fit 
of the model. However, among the tested feature selec- 
tion methods, only the Ridge regression model per- 
formed satisfactorily in the generalizability assessment; 
therefore, it was selected as the preferred method for 
survival-signature modeling. In addition to providing a 
means of avoiding overfitting, obtaining a parsimoni- 
ous variable set through the penalized Ridge regression 
was necessary to assess the relative strengths of the 
conventional markers and the transcripts in the final 
model. Among the conventional markers, both the 

Table 4 The final mortality-predicting signature assessed using the Cox multivariate regression model

Variable                          HR (95% CI)       S.E.   Z      P
BMI                               0.84 (0.77-0.91)  0.03   -4.22  <0.001
Frailty index (ref. = non-frail)
  Pre-frail                       9.53 (2.70-33.6)  6.12   3.51   <0.001
  Frail                           17.7 (4.61-67.9)  12.14  4.19   <0.001
TMEM70                            0.39 (0.18-0.84)  0.15   -2.40  0.017
GADD45B                           2.60 (1.02-6.62)  1.24   2.00   0.045
NME4                              1.77 (1.11-2.80)  0.41   2.42   0.015
MBP                               0.58 (0.36-0.93)  0.14   -2.27  0.023
CHEK2                             0.26 (0.12-0.55)  0.10   -3.50  <0.001
VKORC1                            0.33 (0.18-0.59)  0.10   -3.68  <0.001
LRCH3                             0.47 (0.25-0.87)  0.15   -2.40  0.016
LTA                               2.09 (1.37-3.19)  0.45   3.43   0.001
SH2D1B                            0.52 (0.36-0.76)  0.10   -3.44  0.001

Abbreviations: BMI body-mass index, CI confidence interval, HR hazard ratio,
S.E. standard error.
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Figure 2 The top-ranked IPA-generated network based on the 331 mortality-associated transcripts. The expression levels of these 
transcripts predicted mortality independent of BMI, frailty index and cf-DNA level. The molecules in the network are enriched for the following 
functions: Cell Death and Survival, Inflammatory Response and Cellular Function and Maintenance. Green color indicates that low expression level 
of the transcript predicts mortality, whereas red color indicates that high expression level of the transcript predicts mortality. 



BMI and the frailty index remained in the final model, 
indicating that the concomitant loss of physiological 
reserves in multiple homeostatic systems is detrimen- 
tal to survival. However, the cf-DNA level, which ap- 
pears to be the best plasma-based mortality predictor 
according to current data as well as our earlier Vitality 
90+ cohort [7], was replaced by the transcripts in our 
final model, suggesting that the information captured 
by the cf-DNA level overlaps with and is better 
reflected by the transcript expression levels. The find- 
ing that none of the traditional markers of inflamma- 
ging or of T cell immunosenescence were predictive of 
mortality was somewhat unexpected. However, the 
final signature transcripts LTA (high expression) and 
SH2D1B, MBP and LRCH3 (low expression) were as- 
cribed to immunoinflammatory processes. Specifically, 
LT-α (the protein product of LTA), which is a member 
of the TNF superfamily that plays pivotal roles in the 
function and development of the immune system, has 
been regarded as a central player in various inflamma- 
tory conditions [26]. The adapter molecule SH2D1B 
(also known as EAT-2) is known to play an indispens- 
able role in natural killer (NK) cell activation and cyto- 
toxicity, and it also enhances antigen-specific immune 
responses [27]. For MBP of immune cell origin (termed 
Golli-MBP), the only function demonstrated thus far is 
the negative regulation of T cell activation through the 
inhibition of Ca2+ influx; the ablation of Golli-MBP 
leads to T cell hyperproliferation in the first phase but 
may subsequently trigger T cell anergy [28]. The func- 
tions of LRCH3 are poorly understood, although a 
recent study identified it as a TNF-α, IL-1β and EBV 
latent membrane protein 1-dependent upstream regu- 
lator of NF-kB activity [29]. Interestingly, the IPA- 
generated networks from the final signature transcripts 
(Figure 2) and the 331 mortality-predicting transcripts 
(Additional file 4: Figure S1) also displayed NF-kB as a 
central node, underscoring the role of NF-kB-mediated 
immunoinflammatory regulation in late-life mortality. 
Finally, as 8/10 of the mortality-associated pathways 
(Table 3) were assigned to functions involving adaptive 
and innate immunity, it appears that pervasive immunoin- 
flammatory dysregulation at the transcriptomic level pre- 
cedes old-age mortality, regardless of its cause. 

The final signature transcripts GADD45B (elevated), 
CHEK2 (decreased), TMEM70 (decreased) and NME4 
(elevated) demonstrated that the control of the DNA damage 
response, apoptosis and cellular maintenance, including 
mitochondrial functions, was likewise essential to mortality. 
In addition to serving as a crucial regulator of 
immune cell differentiation and cytokine production, 
GADD45β is upregulated under conditions of growth arrest 
due to cellular (genotoxic) stress and DNA damage 
[30]. The cell cycle checkpoint regulator CHEK2 is like- 
wise involved in the DNA damage response by preventing 
the entry into mitosis following DNA damage [31]. Inter- 
estingly, CHEK2 also has a crucial role in the assembly of 
the mitotic spindle and the maintenance of chromosomal 
stability [31]. Both NME4 (also known as NDPK-D) and 
TMEM70 are localized to the mitochondrion. NDPK-D 
plays a role in the supply of nucleotides and also putatively 
acts in apoptosis through cardiolipin transfer [32], 
whereas TMEM70 is required to maintain the activity of 
ATP synthase [33]. In addition, the finding that the IPA- 
generated network based on the 331 mortality-predicting 
transcripts (Additional file 4: Figure S1) included the 
Cell Death and Survival and Cellular Function and 
Maintenance functions further recapitulates the relevance 
of these processes in old-age mortality. Interestingly, 
the final signature also incorporated VKORC1 
(decreased), which is an enzyme that aids in the maintenance 
of hemostasis through the conversion of vitamin K 
to its active form. However, the significance of 
VKORC1 expression in immune cells in relation to 
mortality is ambiguous. One plausible link could be 
the role of vitamin K as a cofactor in posttranslational 
protein modification, leading to the production of 
γ-carboxyglutamate/vitamin K-dependent (VKD) proteins. 
Indeed, the VKD protein GAS6 has been demonstrated 
to play a role in leukocyte migration and 
proliferation, phagocytosis and apoptosis [34]. Alterna- 
tively, the cofactor-independent immunomodulatory 
activities of vitamin K might account for this finding 
because vitamin K has been shown to downregulate 
the production of certain proinflammatory cytokines - 
an effect potentially mediated through NF-kB [35]. 

In addition to the present study, one previous study 
performed a penalized regression analysis (Lasso) to pre- 
dict age-associated mortality using transcriptomic data 
from cultured lymphoblastoid cell lines [36]. Despite the 
apparent differences in the settings of these studies, the 
cellular functions represented by the top-ranking tran- 
scripts were similar. For example, the most significant 
survival-associated transcripts found in the study by 
Kerber et al. [36] were CORO1A, IQGAP1, AURKB, 
TERF2IP and CBX5, which play roles in processes such 
as T-cell mediated immunity, mitochondrial apoptosis, mi- 
tosis and chromatin maintenance. However, a between- 
study comparison of the survival-associated transcripts 
(Additional file 2: Table S1 for our data) revealed only 12 
common transcripts, of which four (CYB5B, IQGAP1, 
TERF2IP and UBEV2) exhibited Z-scores of the same 
directionality. This discrepancy could be at least partially due to 
the differences in the cell types used in these investigations, 
which were in vivo blood mononuclear cells in our study 
but were cultured and transformed B cells in that of Kerber 
et al. [36]. 
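
The between-study comparison described above amounts to intersecting two transcript lists and checking whether the shared transcripts have Z-scores of the same sign. A minimal sketch of that bookkeeping is given below; the function name concordant_overlap, the gene labels and the Z-scores are hypothetical placeholders for illustration only and are not values from either study.

```python
def concordant_overlap(z_a, z_b):
    """Return (shared transcripts, shared transcripts with same-sign Z-scores).

    z_a and z_b map transcript identifiers to survival-association Z-scores.
    """
    shared = sorted(set(z_a) & set(z_b))
    # A positive product means both Z-scores point in the same direction.
    same_direction = [g for g in shared if z_a[g] * z_b[g] > 0]
    return shared, same_direction

# Hypothetical Z-scores, for illustration only.
study_a = {"GENE1": 2.1, "GENE2": -1.8, "GENE3": 0.9, "GENE4": -2.5}
study_b = {"GENE2": -2.2, "GENE3": -1.1, "GENE4": -0.7, "GENE5": 3.0}

shared, same_direction = concordant_overlap(study_a, study_b)
print(len(shared), "shared;", len(same_direction), "with the same directionality")
```

In practice, the identifiers would first need to be harmonized between platforms (for example, by mapping probes to current gene symbols), because a mismatch in nomenclature alone can reduce the apparent overlap.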

Using another type of approach, van den Akker et al. 
[37] conducted a meta-analysis on established aging- 
associated transcripts and identified a protein-protein 
interaction module (Module F) consisting of 33 tran- 
scripts, whose mean expression was also associated with 
old-age survival in the Leiden Longevity Study [37]. In 
accordance with our findings, this module contained 
transcripts involved in mitochondrial functions (e.g., 
MTERF, ACADM and TFB2M) and the regulation of the 
cell cycle and mitosis (e.g., BUB3, APPBP1 and CDC23). 
Four age-associated transcripts in Module F (BUB3, 
APPBP1/NAE1, TFB2M and HNRPR) were also observed 
in our dataset (Additional file 2: Table S1), all of which 
exhibited downregulated expression associated with an 
increased risk of mortality. Hence, it appears that the cellu- 
lar functions that are most robustly associated with old-age 
survival are similar in different populations and can 
be captured through different approaches. 

Overlaying the mortality-associated pathways (Table 3) 
with the pathways previously reported to be regulated by 
age in our study population (Additional file 7: Table S3 in 
[13]) revealed commonalities with CD28 Signaling in T 
Helper Cells, B Cell Receptor Signaling, CD40 Receptor Sig- 
naling and PI3K Signaling in B Lymphocytes. However, the 
overlap in the transcripts themselves was negligible; only 
11 (ADM, FAM46C, GRAP, HIST2H2AA4, IER2, IER3, 
NACC2, NLRP3, RORA and SOCS3) were both mortality- 
associated and age-regulated. A similar phenomenon was 
observed in the comparison of our mortality-associated 
transcripts with previously reported age-regulated tran- 
scripts [12,14]. Furthermore, some of the transcripts that 
were both mortality- and age-associated exhibited discrep- 
ancies in their direction of expression. For example, the ele- 
vated expression of GRAP was associated with an increased 
risk of mortality (Additional file 2: Table S1), whereas the 
downregulated expression of this transcript was associated 
with increased age [12,13]. These findings raise the question 
of whether some of the reported age-regulated gene expression 
changes that have been deemed unfavorable 
merely because they were associated with aging are in fact 
intentional and advantageous in the aged body. In this sce- 
nario, deviation from this optimal gene expression pattern 
in the opposite direction would lead to cellular disturbance, 
which would be relevant to mortality. Another noteworthy 
observation is that one well-known life span regulator, the 
mTOR pathway [38], did not emerge in our pathway ana- 
lysis, although a few individual components of this 
pathway (PIK3C3, PRKCB, RAC1 and RPS6KA1) were 
present among the 331 mortality-associated transcripts. 
Thus, we hypothesize that the significance of mTOR- 
mediated cellular regulation subsides in the later phase of 
life, once an individual has already reached very old age. 
Overall, our results suggest that while the majority of the 
decisive processes are likely to be common in ageing and 
mortality, the actual driver genes underlying these phenom- 
ena may differ. 

The major limitations of the current study are the lack of 
an external validation cohort, which would be ideal for 
assessing the universality of the transcriptomic predictors, 
and the small sample size. In addition, because all subjects 
were homogeneous in terms of age (90 years), we were un- 
able to determine whether the identified predictors perform 
similarly in individuals of other (old) ages. 

Conclusions 

Taken together, our systematic characterization of the 
determinants of old-age mortality underscores the joint 
impact of the decline in physiological reserves, the fidel- 
ity of immunoinflammatory processes and the control of 
the DNA damage response, cell cycle and mitochondrial 
functions. In addition, our findings corroborate the pro- 
posed roles of NF-kB in the aging process and aging- 
related degeneration [39], and indicate that this protein 
complex is central to the mechanisms underlying late- 
life survival. We further conclude that the incorporation 
of gene expression data into a model with conventional 
predictors could contribute to the understanding of 
the mechanisms underlying old-age mortality. However, 
because cohorts including both genome-wide transcrip- 
tomic data and mortality follow-ups are currently scarce, 
further studies are necessary to ascertain the universality 
of our results. 

Additional files 



prominent mediator of the molecular interconnections. The molecules 
incorporated into the final signature are shown in enlarged bold font, 
and the connective molecules are shown in regular font. Green color 
indicates that low expression level of the transcript predicts mortality, 
whereas red indicates that high expression level of the transcript predicts 
mortality. 

Additional file 7: Table S3. Displaying the stepwise assessment of the 
variable combinations for the final Cox regression model. 

Additional file 8: Table S4. Displaying the results of the Ridge 
regression model performed with the conventional predictors and 
transcriptomic data (the combined model). 